
    CMInject: Python framework for the numerical simulation of nanoparticle injection pipelines

    CMInject simulates nanoparticle injection experiments for particles with diameters in the micrometer-to-nanometer regime, e.g., for single-particle-imaging experiments. Particle-particle interactions and particle-induced changes in the surrounding fields are disregarded, owing to the low nanoparticle concentration in these experiments. CMInject's focus lies on the correct modeling of the different forces acting on such particles, such as fluid-dynamic or light-induced interactions, to allow for simulations that further the scientific development of nanoparticle injection pipelines. To provide a usable basis for this framework and allow a variety of experiments to be simulated, we implemented a first set of specific force models: fluid drag forces, Brownian motion, and photophoretic forces. For verification, we benchmarked a drag-force-based simulation against a nanoparticle focusing experiment. We envision its use and further development by experimentalists, theorists, and software developers.
    Program summary:
    Program Title: CMInject
    CPC Library link to program files: https://doi.org/10.17632/rbpgn4fk3z.1
    Developer's repository link: https://github.com/cfel-cmi/cminject
    Code Ocean capsule: https://codeocean.com/capsule/5146104
    Licensing provisions: GPLv3
    Programming language: Python 3
    Supplementary material: Code to reproduce and analyze simulation results, example input and output data, video files of trajectory movies
    Nature of problem: Well-defined, reproducible, and interchangeable simulation setups of experimental injection pipelines for biological and artificial nanoparticles, in particular pipelines that aim to advance the field of single-particle imaging.
    Solution method: The definition and implementation of an extensible Python 3 framework to model and execute such simulation setups based on object-oriented software design, making use of parallelization facilities and modern numerical integration routines.
    Additional comments including restrictions and unusual features: Supplementary executable scripts for quantitative and visual analyses of result data are also part of the framework.
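    As a rough illustration of the first two force models named in the abstract (not CMInject's actual API), the sketch below integrates a single nanoparticle through a toy carrier-gas flow with Stokes drag in the overdamped limit plus Brownian motion; the flow field and all parameter values are invented for illustration.

```python
import numpy as np

kB = 1.380649e-23            # Boltzmann constant [J/K]
T = 293.15                   # carrier-gas temperature [K]
mu = 1.8e-5                  # dynamic viscosity of air [Pa s]
r = 50e-9                    # particle radius [m] (nanometer regime)

gamma = 6 * np.pi * mu * r   # Stokes drag coefficient
D = kB * T / gamma           # Einstein diffusion coefficient [m^2/s]

def gas_velocity(pos):
    """Toy flow field that pushes particles toward the beam axis (z)."""
    return np.array([-0.5 * pos[0], -0.5 * pos[1], 10.0])  # [m/s]

rng = np.random.default_rng(0)
dt, n_steps = 1e-6, 5000
x = np.array([1e-4, 0.0, 0.0])        # start 100 um off-axis

for _ in range(n_steps):
    # Overdamped limit: the velocity relaxation time m/gamma (~tens of ns
    # for such particles) is far below dt, so the particle tracks the gas
    # flow, plus a Brownian displacement of std sqrt(2 D dt) per axis.
    x = x + gas_velocity(x) * dt + rng.normal(0.0, np.sqrt(2 * D * dt), size=3)

print(f"final radial offset: {np.hypot(x[0], x[1]):.3e} m")
```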

    DiffPhase: Generative Diffusion-based STFT Phase Retrieval

    Diffusion probabilistic models have been recently used in a variety of tasks, including speech enhancement and synthesis. As a generative approach, diffusion models have been shown to be especially suitable for imputation problems, where missing data is generated based on existing data. Phase retrieval is inherently an imputation problem, where phase information has to be generated based on the given magnitude. In this work, we build upon previous work in the speech domain, adapting a speech enhancement diffusion model specifically for STFT phase retrieval. Evaluation using speech quality and intelligibility metrics shows the diffusion approach is well-suited to the phase retrieval task, with performance surpassing both classical and modern methods.
    Comment: Submitted to ICASSP 202
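    For context on the "classical" methods the abstract refers to, the sketch below implements plain Griffin-Lim phase retrieval with scipy: starting from the given magnitude and a random phase, it alternates a consistency projection and a magnitude projection. The signal, window length, and iteration count are arbitrary stand-ins; the diffusion model itself is not reproduced here.

```python
import numpy as np
from scipy.signal import stft, istft

fs, nperseg = 16000, 512
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)                              # stand-in for a speech signal
magnitude = np.abs(stft(x, fs=fs, nperseg=nperseg)[2])   # the only given information

# Start from a random phase and alternate the two projections.
S = magnitude * np.exp(1j * rng.uniform(-np.pi, np.pi, magnitude.shape))
for _ in range(100):
    _, x_hat = istft(S, fs=fs, nperseg=nperseg)          # back to time domain
    S = stft(x_hat, fs=fs, nperseg=nperseg)[2]           # consistency projection
    S = magnitude * np.exp(1j * np.angle(S))             # magnitude projection

_, x_rec = istft(S, fs=fs, nperseg=nperseg)
print("reconstructed length:", x_rec.shape[0])
```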

    DriftRec: Adapting diffusion models to blind JPEG restoration

    In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels. We propose an elegant modification of the forward stochastic differential equation of diffusion models to adapt them to this restoration task, and name our method DriftRec. Comparing DriftRec against an L2 regression baseline with the same network architecture and two state-of-the-art techniques for JPEG restoration, we show that our approach can escape the tendency of other methods to generate blurry images, and recovers the distribution of clean images significantly more faithfully. For this, only a dataset of clean/corrupted image pairs and no knowledge about the corruption operation is required, enabling wider applicability to other restoration tasks. In contrast to other conditional and unconditional diffusion models, we utilize the idea that the distributions of clean and corrupted images are much closer to each other than each is to the usual Gaussian prior of the reverse process in diffusion models. Our approach therefore requires only low levels of added noise and needs comparatively few sampling steps even without further optimizations. We show that DriftRec naturally generalizes to realistic and difficult scenarios such as unaligned double JPEG compression and blind restoration of JPEGs found online, without having encountered such examples during training.
    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
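    The following toy sketch illustrates the kind of forward-SDE modification the abstract describes, assuming an Ornstein-Uhlenbeck-style drift that pulls the clean image toward its corrupted counterpart while only low noise is added; the stiffness and noise values are invented and this is not DriftRec's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(64, 64))                   # stand-in clean image
y = np.clip(x + 0.1 * rng.standard_normal(x.shape), 0, 1)  # stand-in corrupted image

theta, sigma = 4.0, 0.05    # drift stiffness and (low) noise scale: assumptions
n_steps = 200
dt = 1.0 / n_steps

for _ in range(n_steps):
    # Mean-reverting drift toward the corrupted image, plus mild noise.
    x = x + theta * (y - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)

# The forward process ends near y with small variance, so the reverse
# process can start from (roughly) the corrupted image instead of pure noise.
print("mean |x - y| at the end of the forward process:", np.abs(x - y).mean())
```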

    Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement

    Recently, score-based generative models have been successfully employed for the task of speech enhancement. A stochastic differential equation is used to model the iterative forward process, where at each step environmental noise and white Gaussian noise are added to the clean speech signal. While in the limit the mean of the forward process ends at the noisy mixture, in practice it stops earlier and thus only reaches an approximation of the noisy mixture. This results in a discrepancy between the terminating distribution of the forward process and the prior used for solving the reverse process at inference. In this paper, we address this discrepancy and propose a forward process based on a Brownian bridge. We show that such a process leads to a reduction of the mismatch compared to previous diffusion processes. More importantly, we show that our approach improves over the baseline process in objective metrics with only half the iteration steps, while having one fewer hyperparameter to tune.
    Comment: 5 pages, 2 figures, Accepted to Interspeech 2022
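    A minimal numerical sketch of the Brownian-bridge idea: the bridge's marginal mean interpolates from clean speech to the noisy mixture, and its variance vanishes at both endpoints, so the terminal distribution matches the inference prior exactly. The signals and the noise scale below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16000
x0 = rng.standard_normal(n)             # stand-in clean speech
y = x0 + 0.5 * rng.standard_normal(n)   # stand-in noisy mixture

sigma, T = 0.3, 1.0                     # assumed noise scale and time horizon
for t in (0.0, 0.5, 1.0):
    mean = (1 - t / T) * x0 + (t / T) * y
    var = sigma**2 * t * (T - t) / T    # bridge variance: zero at t=0 and t=T
    xt = mean + np.sqrt(var) * rng.standard_normal(n)
    # At t=T the process is pinned to y, so the prior mismatch vanishes.
    print(f"t={t:.1f}  ||x_t - y|| = {np.linalg.norm(xt - y):.2f}")
```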

    A Flexible Online Framework for Projection-Based STFT Phase Retrieval

    Several recent contributions in the field of iterative STFT phase retrieval have demonstrated that the performance of the classical Griffin-Lim method can be considerably improved upon. By using the same projection operators as Griffin-Lim, but combining them in innovative ways, these approaches achieve better results in terms of both reconstruction quality and required number of iterations, while retaining a similar computational complexity per iteration. However, like Griffin-Lim, these algorithms operate in an offline manner and thus require an entire spectrogram as input, which is an unrealistic requirement for many real-world speech communication applications. We propose to extend RTISI -- an existing online (frame-by-frame) variant of the Griffin-Lim algorithm -- into a flexible framework that enables straightforward online implementation of any algorithm based on iterative projections. We further employ this framework to implement online variants of the fast Griffin-Lim algorithm, the accelerated Griffin-Lim algorithm, and two algorithms from the optics domain. Evaluation results on speech signals show that, similarly to the offline case, these algorithms can achieve a considerable performance gain compared to RTISI.
    Comment: Submitted to ICASSP 2
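    As an offline illustration of "combining the same projection operators in innovative ways", the sketch below builds the fast Griffin-Lim update (momentum added on top of plain Griffin-Lim) from the two shared projections; the paper's framework applies such operators frame by frame online rather than to a whole spectrogram. Window, momentum value, and signal are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def P_C(S, fs=16000, nperseg=512):
    """Project onto consistent spectrograms: istft followed by stft."""
    _, x = istft(S, fs=fs, nperseg=nperseg)
    return stft(x, fs=fs, nperseg=nperseg)[2]

def P_M(S, magnitude):
    """Project onto spectrograms with the prescribed magnitude."""
    return magnitude * np.exp(1j * np.angle(S))

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                           # stand-in signal
magnitude = np.abs(stft(x, fs=16000, nperseg=512)[2])

alpha = 0.99                                             # momentum; an assumption
S = magnitude * np.exp(1j * rng.uniform(-np.pi, np.pi, magnitude.shape))
S_prev = S.copy()
for _ in range(100):
    S_proj = P_M(P_C(S), magnitude)                      # one Griffin-Lim step
    S = S_proj + alpha * (S_proj - S_prev)               # fast Griffin-Lim extrapolation
    S_prev = S_proj

_, x_rec = istft(P_M(S, magnitude), fs=16000, nperseg=512)
print("reconstructed length:", x_rec.shape[0])
```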

    Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration

    Diffusion-based generative models have had a high impact on the computer vision and speech processing communities in recent years. Besides data generation tasks, they have also been employed for data restoration tasks like speech enhancement and dereverberation. While discriminative models have traditionally been argued to be more powerful, e.g., for speech enhancement, generative diffusion approaches have recently been shown to narrow this performance gap considerably. In this paper, we systematically compare the performance of generative diffusion models and discriminative approaches on different speech restoration tasks. For this, we extend our prior contributions on diffusion-based speech enhancement in the complex time-frequency domain to the task of bandwidth extension. We then compare it to a discriminatively trained neural network with the same network architecture on three restoration tasks, namely speech denoising, dereverberation, and bandwidth extension. We observe that the generative approach performs globally better than its discriminative counterpart on all tasks, with the strongest benefit for non-additive distortion models, as in dereverberation and bandwidth extension. Code and audio examples can be found online at https://uhh.de/inf-sp-sgmsemultitask
    Comment: Submitted to ICASSP 202
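    The core of the comparison, sketched in toy form below, is that the same network architecture can be trained under a discriminative L2 regression objective or under a generative objective (epsilon prediction, equivalent to denoising score matching up to a scaling); the tiny linear layers, data, and noise schedule here are placeholders only.

```python
import torch

torch.manual_seed(0)
net_disc = torch.nn.Linear(256, 256)        # stand-in "architecture"
net_gen = torch.nn.Linear(256 + 1, 256)     # same size, plus a time input

clean = torch.randn(8, 256)                 # stand-in clean features
noisy = clean + 0.5 * torch.randn(8, 256)   # stand-in corrupted features

# Discriminative objective: regress the clean target directly (L2 loss).
loss_disc = ((net_disc(noisy) - clean) ** 2).mean()

# Generative objective: perturb the clean target along an assumed noise
# schedule and regress the injected noise, conditioned on the time t.
t = torch.rand(8, 1)
sigma_t = 0.1 + 0.9 * t
eps = torch.randn_like(clean)
x_t = clean + sigma_t * eps
loss_gen = ((net_gen(torch.cat([x_t, t], dim=1)) - eps) ** 2).mean()

print(float(loss_disc), float(loss_gen))
```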

    Speech Enhancement and Dereverberation with Diffusion-based Generative Models

    In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve into an extensive theoretical examination of its implications. As opposed to usual conditional generation tasks, we do not start the reverse process from pure Gaussian noise but from a mixture of noisy speech and Gaussian noise. This matches our forward process, which moves from clean speech to noisy speech by including a drift term. We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates. By adapting the network architecture, we are able to significantly improve the speech enhancement performance, indicating that the network, rather than the formalism, was the main limitation of our original approach. In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models and achieves better generalization when evaluating on a different corpus than used for training. We complement the results with an instrumental evaluation using real-world noisy recordings and a listening experiment, in which our proposed method is rated best. Examining different sampler configurations for solving the reverse process allows us to balance the performance and computational speed of the proposed method. Moreover, we show that the proposed method is also suitable for dereverberation and thus not limited to additive background noise removal. Code and audio examples are available online at https://github.com/sp-uhh/sgmse
    Comment: Accepted version
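    A heavily simplified sketch of the sampling procedure the abstract describes: the reverse process starts from the noisy mixture plus Gaussian noise and runs only a small number of steps. The score model below is a placeholder, and the drift and noise schedule are invented stand-ins for the paper's SDE.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(16000)                 # stand-in noisy mixture

def score_model(x, y, t):
    """Placeholder for the trained score network s_theta(x, y, t)."""
    return -(x - y)                            # toy score pulling x toward y

N = 30                                         # the small step count from the abstract
dt = 1.0 / N
sigma = 0.5                                    # assumed noise scale

x = y + sigma * rng.standard_normal(y.shape)   # start: mixture plus Gaussian noise
for n in range(N, 0, -1):                      # integrate the reverse SDE, t: 1 -> 0
    t = n * dt
    f = 0.5 * (y - x)                          # assumed forward drift toward y
    rev_drift = f - sigma**2 * score_model(x, y, t)
    x = x - rev_drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)

print("clean speech estimate shape:", x.shape)
```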

    EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

    Speech emotion conversion is the task of converting the expressed emotion of a spoken utterance to a target emotion while preserving the lexical content and speaker identity. While most existing works in speech emotion conversion rely on acted-out datasets and parallel data samples, in this work we specifically focus on more challenging in-the-wild scenarios and do not rely on parallel data. To this end, we propose a diffusion-based generative model for speech emotion conversion, EmoConv-Diff, that is trained to reconstruct an input utterance while also conditioning on its emotion. Subsequently, at inference, a target emotion embedding is employed to convert the emotion of the input utterance to the given target emotion. As opposed to performing emotion conversion on categorical representations, we use a continuous arousal dimension to represent emotions while also achieving intensity control. We validate the proposed methodology on a large in-the-wild dataset, MSP-Podcast v1.10. Our results show that the proposed diffusion model is indeed capable of synthesizing speech with a controllable target emotion. Crucially, the proposed approach shows improved performance along the extreme values of arousal and thereby addresses a common challenge in the speech emotion conversion literature.
    Comment: Submitted to ICASSP 202
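    The conditioning scheme can be pictured as follows (placeholder model and names, not the paper's architecture): training reconstructs the utterance conditioned on its own arousal value, and inference simply swaps in the target arousal, which is continuous and therefore also gives intensity control.

```python
import torch

torch.manual_seed(0)
decoder = torch.nn.Linear(128 + 1, 128)       # stand-in for the diffusion decoder

def generate(content: torch.Tensor, arousal: float) -> torch.Tensor:
    """Condition generation on a continuous arousal value in [0, 1]."""
    a = torch.full((content.shape[0], 1), arousal)
    return decoder(torch.cat([content, a], dim=1))

utt = torch.randn(1, 128)                     # stand-in content/speaker embedding
reconstruction = generate(utt, arousal=0.3)   # training target: the source arousal
converted = generate(utt, arousal=0.9)        # inference: swap in the target arousal
print(converted.shape)
```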

    Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

    Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the complex short-time Fourier transform (STFT) domain, proposing a novel training task for speech enhancement using a complex-valued deep neural network. We derive this training task within the formalism of stochastic differential equations, thereby enabling the use of predictor-corrector samplers. We provide alternative formulations inspired by previous publications on using SGMs for speech enhancement, avoiding the need for any prior assumptions on the noise distribution and making the training task purely generative, which, as we show, results in improved enhancement performance.
    Comment: Submitted to INTERSPEECH 202
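    To make the training-task idea concrete, the sketch below sets up a denoising-score-matching target on complex STFT coefficients, interpolating between clean and noisy spectrograms; the schedule, interpolation, and loss weighting are assumptions for illustration, not the paper's exact derivation.

```python
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.5 * rng.standard_normal(16000)
X0 = stft(clean, fs=16000, nperseg=512)[2]     # complex clean spectrogram
Y = stft(noisy, fs=16000, nperseg=512)[2]      # complex noisy spectrogram

t = rng.uniform(0.01, 1.0)                     # random diffusion time
sigma_t = 0.05 + 0.5 * t                       # assumed noise schedule
mean = (1 - t) * X0 + t * Y                    # assumed interpolating mean
# Circularly-symmetric complex Gaussian perturbation of the spectrogram.
z = (rng.standard_normal(X0.shape) + 1j * rng.standard_normal(X0.shape)) / np.sqrt(2)
Xt = mean + sigma_t * z

# Denoising score matching target: the score of the Gaussian perturbation.
target_score = -(Xt - mean) / sigma_t**2
# A network s_theta(Xt, Y, t) would be regressed onto this target with a
# weighted L2 loss, e.g. mean( sigma_t**2 * |s_theta - target_score|**2 ).
print("spectrogram shape:", Xt.shape)
```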

    BEAR reveals that increased fidelity variants can successfully reduce the mismatch tolerance of adenine but not cytosine base editors

    Base editors allow for precision engineering of the genome. Here, the authors present BEAR, a plasmid-based fluorescence assay for the measurement of CBE and ABE activity, to reveal the mechanism underlying their differences and to increase the yield of edited cells with reduced indel background.